In this practical, we are going to explore statistical testing methods for metagenomic data, how to visualize the results, and how to train machine learning models using the SIAMCAT package.
Please note that this document contains the solutions to the exercises!
To get started, we first prepare our R environment and load the packages we will need later on. The data used in this practical are stored on the EMBL servers, so we also set the base path for the downloads.
library("tidyverse") # for general data wrangling and plotting
library("SIAMCAT") # for statistical and ML analyses
data.location <- 'https://embl.de/download/zeller/metaG_course/'
In this practical, we are going to have a look at the data from Zeller et al. MSB 2014. In this study, the authors recruited patients with colorectal cancer (CRC) and healthy controls (CTR) and performed shotgun metagenomic sequencing of fecal samples. The raw data have already been pre-processed and analyzed with the mOTUs2 taxonomic profiler.
First, we are going to load the taxonomic profiles and store them as a matrix.
fn.feat.fr <- paste0(data.location, '/motus_profiles/FR-CRC.motus')
tax.profiles <- read.table(fn.feat.fr, sep = '\t', quote = '',
comment.char = '', skip = 61,
stringsAsFactors = FALSE, check.names = FALSE,
row.names = 1, header = TRUE)
tax.profiles <- as.matrix(tax.profiles)
The taxonomic profiles contain absolute counts and can easily be transformed into relative abundances using the prop.table function:
rel.tax.profiles <- prop.table(tax.profiles, 2)
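As a quick sanity check, after `prop.table` each column (sample) of the relative abundance matrix should sum to 1:

```r
# each sample should now sum to 1 (up to floating-point precision)
stopifnot(all(abs(colSums(rel.tax.profiles) - 1) < 1e-12))
```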
Additionally, we need to know which sample belongs to which group. Therefore, we load the metadata table as well:
fn.meta.fr <- paste0(data.location, '/metadata/meta_FR-CRC.tsv')
df.meta <- read_tsv(fn.meta.fr)
df.meta
table(df.meta$Group)
##
## ADA CRC CTR NAA
## 15 53 61 27
We are interested in the comparison between control samples (CTR) and colorectal cancer samples (CRC), so we first remove the other samples, which represent advanced adenoma (ADA) or non-advanced adenoma (NAA). Also, we transform the metadata into a data.frame object (which is easier for some analyses later on).
df.meta <- df.meta %>%
filter(Group %in% c('CRC', 'CTR')) %>%
as.data.frame()
rownames(df.meta) <- df.meta$Sample_ID
# restrict the taxonomic profiles to CRC and CTR samples
tax.profiles <- tax.profiles[,rownames(df.meta)]
rel.tax.profiles <- rel.tax.profiles[,rownames(df.meta)]
Currently, the matrix of taxonomic profiles contains 14213 different bacterial species. Not all of them will be relevant for our question, since some are present only in a handful of samples (low prevalence) or at extremely low abundance. Therefore, it can make sense to filter your taxonomic profiles before you begin the analysis. Here, we could for example use the maximum species abundance as a filtering criterion: all species with a relative abundance above 1e-03 in at least one of the samples are kept, the rest are filtered out.
species.max.value <- apply(rel.tax.profiles, 1, max)
f.idx <- which(species.max.value > 1e-03)
rel.tax.profiles.filt <- rel.tax.profiles[f.idx,]
Additionally, the mOTUs2 profiler also estimates how much of the relative abundance cannot be classified. We filter out this "Unknown" fraction as well.
# unclassified are indicated by -1 in the mOTUs2 profiles
idx.um <- which(rownames(rel.tax.profiles.filt) == '-1')
rel.tax.profiles.filt <- rel.tax.profiles.filt[-idx.um,]
Now that we have set up everything, we can test all microbial species for statistically significant differences between the two groups. To do so, we perform a Wilcoxon test on each individual bacterial species.
p.vals <- rep_len(1, nrow(rel.tax.profiles.filt))
names(p.vals) <- rownames(rel.tax.profiles.filt)
stopifnot(all(rownames(df.meta) == colnames(rel.tax.profiles.filt)))
for (i in rownames(rel.tax.profiles.filt)){
x <- rel.tax.profiles.filt[i,]
y <- df.meta$Group
t <- wilcox.test(x~y)
p.vals[i] <- t$p.value
}
head(sort(p.vals))
## Fusobacterium nucleatum subsp. animalis [ref_mOTU_v25_01001]
## 1.072932e-07
## Dialister pneumosintes [ref_mOTU_v25_03630]
## 5.047528e-06
## Hungatella hathewayi [ref_mOTU_v25_03436]
## 3.079069e-05
## Olsenella sp. Marseille-P2300 [ref_mOTU_v25_10001]
## 2.283988e-04
## Clostridium sp. [ref_mOTU_v25_03680]
## 2.756465e-04
## [Eubacterium] eligens [ref_mOTU_v25_02375]
## 3.395879e-04
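Since we tested hundreds of species at once, the raw p-values should be corrected for multiple testing before interpreting them, for example with the Benjamini-Hochberg procedure:

```r
# adjust the Wilcoxon p-values for multiple testing (Benjamini-Hochberg FDR)
p.vals.adj <- p.adjust(p.vals, method = 'fdr')
head(sort(p.vals.adj))
```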
The species with the most significant effect seems to be Fusobacterium nucleatum, so let us take a look at the distribution of this species:
species <- 'Fusobacterium nucleatum subsp. animalis [ref_mOTU_v25_01001]'
df.plot <- tibble(fuso=rel.tax.profiles.filt[species,],
label=df.meta$Group)
df.plot %>%
ggplot(aes(x=label, y=fuso)) +
geom_boxplot() +
xlab('') +
ylab('F. nucleatum rel. ab.')
Let us remember that log-scales are important when visualizing relative abundance data!
df.plot %>%
ggplot(aes(x=label, y=log10(fuso + 1e-05))) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.08) +
xlab('') +
ylab('log10(F. nucleatum rel. ab.)')
We can also use the SIAMCAT R package to test for differential abundance and produce standard visualizations.
library("SIAMCAT")
Within SIAMCAT, the data are stored in the SIAMCAT object which contains the feature matrix, the metadata, and information about the groups you want to compare.
sc.obj <- siamcat(feat=rel.tax.profiles, meta=df.meta,
label='Group', case='CRC')
## + starting create.label
## Label used as case:
## CRC
## Label used as control:
## CTR
## + finished create.label.from.metadata in 0.186 s
## + starting validate.data
## +++ checking overlap between labels and features
## + Keeping labels of 114 sample(s).
## +++ checking sample number per class
## +++ checking overlap between samples and metadata
## + finished validate.data in 0.767 s
We can use SIAMCAT for feature filtering as well:
sc.obj <- filter.features(sc.obj, filter.method = 'abundance', cutoff = 1e-03)
## Features successfully filtered
sc.obj <- filter.features(sc.obj, filter.method = 'prevalence',
cutoff = 0.05, feature.type = 'filtered')
## Features successfully filtered
sc.obj
## siamcat-class object
## label() Label object: 61 CTR and 53 CRC samples
## filt_feat() Filtered features: 839 features after abundance, prevalence filtering
##
## contains phyloseq-class experiment-level object @phyloseq:
## phyloseq@otu_table() OTU Table: [ 14213 taxa and 114 samples ]
## phyloseq@sam_data() Sample Data: [ 114 samples by 13 sample variables ]
Now, we can test the filtered features for differential abundance with SIAMCAT:
sc.obj <- check.associations(sc.obj, detect.lim = 1e-05,
fn.plot = './associations.pdf')
The association metrics computed by SIAMCAT are stored in the SIAMCAT object and can be extracted using associations(sc.obj), if you want to have a closer look at the results for yourself. Plot a volcano plot of the associations between cancer and controls using the output from SIAMCAT.
df.assoc <- associations(sc.obj)
df.assoc %>%
ggplot(aes(x=fc, y=-log10(p.adj))) +
geom_point() +
xlab('Fold change')
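To make the volcano plot easier to read, we could additionally mark the significance threshold; the choice of alpha = 0.05 on the adjusted p-values below is an assumption, not part of the original exercise:

```r
# volcano plot with a dashed line at the (assumed) significance threshold
df.assoc %>%
  ggplot(aes(x=fc, y=-log10(p.adj))) +
    geom_point() +
    geom_hline(yintercept = -log10(0.05), linetype = 'dashed') +
    xlab('Fold change') +
    ylab('-log10(adj. p-value)')
```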
Create an ordination plot for our data and colour the samples by group. How would you interpret the results? Try out different ecological distances. How does the choice of distance affect the group separation?
(Tip: make sure to check out the vegdist function in the vegan package and also the pco function in the labdsv package)
First, we can load the vegan and the labdsv packages:
library("vegan")
library("labdsv")
Then, we compute the ecological distances between samples using the vegdist function. By default, it uses the Bray-Curtis distance, but you can play around with other distances as well.
dist.mat <- vegdist(t(rel.tax.profiles.filt))
This distance matrix can be put directly into the pco function:
pco.results <- pco(dist.mat)
Now we can extract the results and merge them with the metadata in order to visualize them. It also makes sense to include the amount of variance explained by the different axes in the PCO.
# extract points
df.plot <- pco.results$points %>%
as_tibble(rownames = 'Sample_ID') %>%
left_join(df.meta)
## Joining, by = "Sample_ID"
# extract percentages of explained variance
percentage <- pco.results$eig[1:2]/sum(pco.results$eig)
percentage <- percentage*100
percentage <- paste0(sprintf(fmt='%.2f', percentage),'%')
df.plot %>%
ggplot(aes(x=V1, y=V2, col=Group)) +
geom_point() +
xlab(paste0('PCo1 [', percentage[1], ']')) +
ylab(paste0('PCo2 [', percentage[2], ']'))
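To see how the choice of distance affects the separation, we can repeat the ordination with another distance supported by vegdist; as one example (a sketch, the choice of Jaccard on presence/absence data is our own), consider:

```r
# Jaccard distance on presence/absence data (binary = TRUE)
dist.mat.jac <- vegdist(t(rel.tax.profiles.filt), method = 'jaccard',
                        binary = TRUE)
pco.jac <- pco(dist.mat.jac)
df.plot.jac <- pco.jac$points %>%
  as_tibble(rownames = 'Sample_ID') %>%
  left_join(df.meta, by = 'Sample_ID')
df.plot.jac %>%
  ggplot(aes(x=V1, y=V2, col=Group)) +
    geom_point()
```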
The SIAMCAT machine learning workflow consists of several steps: feature normalization, data splitting for cross validation, model training, making predictions, and evaluating these predictions.
Since we already created a SIAMCAT object and filtered the raw data, we can start directly with the next step.
SIAMCAT offers a few normalization approaches that can be useful for subsequent statistical modeling in the sense that they transform features in a way that can increase the accuracy of the resulting models. Importantly, these normalization techniques do not make use of any label information (patient status), and can thus be applied up front to the whole data set (and outside of the following cross validation).
sc.obj <- normalize.features(sc.obj, norm.method = 'log.std',
norm.param = list(log.n0=1e-05, sd.min.q=0))
## Features normalized successfully.
sc.obj
## siamcat-class object
## label() Label object: 61 CTR and 53 CRC samples
## filt_feat() Filtered features: 839 features after abundance, prevalence filtering
## associations() Associations: Results from association testing
## with 11 significant features at alpha 0.05
## norm_feat() Normalized features: 839 features normalized using log.std
##
## contains phyloseq-class experiment-level object @phyloseq:
## phyloseq@otu_table() OTU Table: [ 14213 taxa and 114 samples ]
## phyloseq@sam_data() Sample Data: [ 114 samples by 13 sample variables ]
Cross validation is a technique to assess how well an ML model would generalize to external data by partitioning the dataset into training and test sets. Here, we split the dataset into 10 parts, train a model on 9 of these parts, and use the left-out part to test the model. The whole process is repeated 10 times.
sc.obj <- create.data.split(sc.obj, num.folds = 10, num.resample = 10)
## Features splitted for cross-validation successfully.
Now, we can train a LASSO logistic regression classifier in order to distinguish CRC cases and controls.
sc.obj <- train.model(sc.obj, method='lasso')
## Trained lasso models successfully.
This function will automatically apply the models trained in cross validation to their respective test sets and aggregate the predictions across the whole data set.
sc.obj <- make.predictions(sc.obj)
## Made predictions successfully.
Calling the evaluate.predictions function will result in an assessment of precision and recall as well as in a ROC analysis, both of which can be plotted as a pdf file using the model.evaluation.plot function (the name of/path to the pdf file is passed as an argument).
sc.obj <- evaluate.predictions(sc.obj)
## Evaluated predictions successfully.
model.evaluation.plot(sc.obj, fn.plot = './eval_plot.pdf')
## Plotted evaluation of predictions successfully to: ./eval_plot.pdf
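If you just want the numeric AUROC value instead of the full plot, it is stored in the evaluation data of the SIAMCAT object; note that the 'auroc' entry name below is an assumption based on the SIAMCAT accessors and may differ between package versions:

```r
# extract the AUROC from the evaluation data
# (check names(eval_data(sc.obj)) if the entry name differs)
as.numeric(eval_data(sc.obj)$auroc)
```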
Finally, the model.interpretation.plot function will plot characteristics of the models (i.e. model coefficients or feature importance) alongside the input data, aiding in understanding how and why the model works (or does not).
model.interpretation.plot(sc.obj, consens.thres = 0.7,
fn.plot = './interpretation_plot.pdf')
On the EMBL cluster, there is another dataset from a colorectal cancer metagenomic study. The study population was recruited in Austria, so you can find the data under:
fn.feat.at <- paste0(data.location, '/motus_profiles/AT-CRC.motus')
fn.meta.at <- paste0(data.location, '/metadata/meta_AT-CRC.tsv')
Apply the trained model on this dataset and check the model performance on the external dataset.
First, we load the external data in the same way as we did for the French dataset and create a SIAMCAT object:
# features
tax.at <- read.table(fn.feat.at, sep = '\t', quote = '',
comment.char = '', skip = 61,
stringsAsFactors = FALSE, check.names = FALSE,
row.names = 1, header = TRUE)
tax.at <- as.matrix(tax.at)
rel.tax.at <- prop.table(tax.at, 2)
# metadata
df.meta.at <- read_tsv(fn.meta.at)
df.meta.at <- df.meta.at %>%
filter(Group %in% c('CRC', 'CTR')) %>%
as.data.frame()
rownames(df.meta.at) <- df.meta.at$Sample_ID
tax.at <- tax.at[,rownames(df.meta.at)]
rel.tax.at <- rel.tax.at[,rownames(df.meta.at)]
sc.obj.at.ext <- siamcat(feat=rel.tax.at, meta=df.meta.at,
label='Group', case='CRC')
## + starting create.label
## Label used as case:
## CRC
## Label used as control:
## CTR
## + finished create.label.from.metadata in 0.003 s
## + starting validate.data
## +++ checking overlap between labels and features
## + Keeping labels of 109 sample(s).
## +++ checking sample number per class
## +++ checking overlap between samples and metadata
## + finished validate.data in 0.43 s
We can use the make.predictions function from SIAMCAT to apply the trained models on the external data:
sc.obj.at.ext <- make.predictions(sc.obj, siamcat.holdout = sc.obj.at.ext)
## Features normalized successfully.
## Made predictions successfully.
Then, we can also evaluate the predictions and create a ROC curve:
sc.obj.at.ext <- evaluate.predictions(sc.obj.at.ext)
## Evaluated predictions successfully.
model.evaluation.plot(training.set=sc.obj, validation.set=sc.obj.at.ext,
fn.plot = './eval_plot_transfer_fr_at.pdf')
## Plotted evaluation of predictions successfully to: ./eval_plot_transfer_fr_at.pdf
Train a SIAMCAT model on the Austrian dataset and apply it to the French dataset. How does the model transfer to the external dataset compare between the two directions? Compare also the feature weights when training on the French or the Austrian dataset.
We can run the complete SIAMCAT pipeline for the Austrian data quite easily:
sc.obj.at <- siamcat(feat=rel.tax.at, meta=df.meta.at,
label='Group', case='CRC')
## + starting create.label
## Label used as case:
## CRC
## Label used as control:
## CTR
## + finished create.label.from.metadata in 0.002 s
## + starting validate.data
## +++ checking overlap between labels and features
## + Keeping labels of 109 sample(s).
## +++ checking sample number per class
## +++ checking overlap between samples and metadata
## + finished validate.data in 0.098 s
sc.obj.at <- filter.features(sc.obj.at,
filter.method = 'abundance', cutoff = 1e-03)
## Features successfully filtered
sc.obj.at <- filter.features(sc.obj.at, filter.method = 'prevalence',
cutoff = 0.05, feature.type = 'filtered')
## Features successfully filtered
sc.obj.at <- normalize.features(sc.obj.at, norm.method = 'log.std',
norm.param = list(log.n0=1e-05, sd.min.q=0))
## Features normalized successfully.
sc.obj.at <- create.data.split(sc.obj.at, num.folds = 10, num.resample = 10)
## Features splitted for cross-validation successfully.
sc.obj.at <- train.model(sc.obj.at, method='lasso')
## Trained lasso models successfully.
sc.obj.at <- make.predictions(sc.obj.at)
## Made predictions successfully.
sc.obj.at <- evaluate.predictions(sc.obj.at)
## Evaluated predictions successfully.
Let us create a new SIAMCAT object for the French data (that does not contain trained models already) and apply the trained Austrian model on this dataset:
sc.obj.fr <- siamcat(feat=rel.tax.profiles, meta=df.meta,
label='Group', case='CRC')
## + starting create.label
## Label used as case:
## CRC
## Label used as control:
## CTR
## + finished create.label.from.metadata in 0.002 s
## + starting validate.data
## +++ checking overlap between labels and features
## + Keeping labels of 114 sample(s).
## +++ checking sample number per class
## +++ checking overlap between samples and metadata
## + finished validate.data in 0.077 s
sc.obj.fr <- make.predictions(sc.obj.at, siamcat.holdout = sc.obj.fr)
## Features normalized successfully.
## Made predictions successfully.
sc.obj.fr <- evaluate.predictions(sc.obj.fr)
## Evaluated predictions successfully.
Finally, we can compare the model transfer again:
model.evaluation.plot(training.set=sc.obj.at, validation.set=sc.obj.fr,
fn.plot = './eval_plot_transfer_at_fr.pdf')
## Plotted evaluation of predictions successfully to: ./eval_plot_transfer_at_fr.pdf
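To compare the feature weights between the two models, we can extract them with the feature_weights accessor and merge the two tables; treat this as a sketch, since the exact column names (e.g. median.rel.weight) may differ between SIAMCAT versions:

```r
# extract the (relative) feature weights from both trained models
weights.fr <- feature_weights(sc.obj)
weights.at <- feature_weights(sc.obj.at)
# merge on the feature names and plot the median weights against each other
df.weights <- full_join(
  as_tibble(weights.fr, rownames = 'feature'),
  as_tibble(weights.at, rownames = 'feature'),
  by = 'feature', suffix = c('.fr', '.at'))
df.weights %>%
  ggplot(aes(x=median.rel.weight.fr, y=median.rel.weight.at)) +
    geom_point() +
    xlab('Median rel. weight (FR model)') +
    ylab('Median rel. weight (AT model)')
```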
In addition to the taxonomic profiles, we also created functional profiles for the French dataset. You can find them under:
fn.feat.fr.kegg <- paste0(data.location,
'/functional_profiles/KEGG_kos_FR-CRC.tsv')
Explore the distribution of the functional data (abundance distribution, prevalence, etc.) and compare it to what you observe with the taxonomic profiles. Which filtering regime would make sense for functional data?
First, we will load the data:
func.profiles <- read.table(fn.feat.fr.kegg, sep='\t',
stringsAsFactors = FALSE,
check.names = FALSE, quote = '', row.names = 1,
header = TRUE)
func.profiles <- as.matrix(func.profiles)
func.profiles <- func.profiles[,df.meta$Sample_ID]
rel.func.profiles <- prop.table(func.profiles, 2)
Again, this question is a bit open-ended and invites you to explore the data. For example, you could look at the distribution of abundances:
hist(log10(rel.func.profiles), 100, xlab='Relative abundance',
main='KEGG data',
col='slategrey')
hist(log10(rel.tax.profiles), 100, xlab='Relative abundance',
main='mOTUs data',
col='slategrey')
Also, the prevalence of the KEGG KOs could be interesting, especially when compared to the taxonomic abundance profiles:
prev.tax <- rowMeans(rel.tax.profiles != 0)
prev.func <- rowMeans(rel.func.profiles != 0)
hist(prev.tax, 50, col='slategrey', main='mOTUs data', xlab='Prevalence')
hist(prev.func, 50, col='slategrey', main='KEGG data', xlab='Prevalence')
Or we could look at the maximum abundance of the KEGG KOs (similarly to how we filtered the taxonomic profiles above):
func.max.value <- apply(rel.func.profiles, 1, max)
hist(log10(func.max.value), 100, col='slategrey',
xlab='Maximum relative abundance', main='KEGG data')
A good cutoff for filtering could for example be 1e-05.
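We can quickly check how many KEGG KOs such a cutoff would retain (just a sanity check on the filtering choice, using the maximum abundances computed above):

```r
# number of KOs with a maximum relative abundance above 1e-05
sum(func.max.value > 1e-05)
```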
Use SIAMCAT to train a model based on the functional KEGG profiles and compare it to the one trained on taxonomic profiles.
Note: Since the functional profiles contain many thousands of features, it makes sense to perform feature selection on your dataset. You can supply the parameters for the feature selection to the train.model function in SIAMCAT.
First, let us create a SIAMCAT object for the KEGG data
sc.obj.kegg <- siamcat(feat = rel.func.profiles, meta=df.meta,
label='Group', case='CRC')
## + starting create.label
## Label used as case:
## CRC
## Label used as control:
## CTR
## + finished create.label.from.metadata in 0.001 s
## + starting validate.data
## +++ checking overlap between labels and features
## + Keeping labels of 114 sample(s).
## +++ checking sample number per class
## +++ checking overlap between samples and metadata
## + finished validate.data in 0.52 s
# filter data
sc.obj.kegg <- filter.features(sc.obj.kegg, filter.method = 'abundance',
cutoff=1e-05)
## Features successfully filtered
sc.obj.kegg
## siamcat-class object
## label() Label object: 61 CTR and 53 CRC samples
## filt_feat() Filtered features: 6635 features after abundance filtering
##
## contains phyloseq-class experiment-level object @phyloseq:
## phyloseq@otu_table() OTU Table: [ 9500 taxa and 114 samples ]
## phyloseq@sam_data() Sample Data: [ 114 samples by 13 sample variables ]
Since there are still a lot of functions left after filtering, we can enforce a feature selection scheme within the cross-validation so that the ML models will be trained only on a smaller subset of features. This means that in every cross-validation fold, SIAMCAT will first compute the generalized fold change (on the training data only) and then select the 200 features with the highest absolute fold change value.
sc.obj.kegg <- normalize.features(sc.obj.kegg, norm.method = 'log.std',
norm.param = list(log.n0=1e-08, sd.min.q=0))
## Features normalized successfully.
sc.obj.kegg <- create.data.split(sc.obj.kegg, num.folds = 10, num.resample = 10)
## Features splitted for cross-validation successfully.
sc.obj.kegg <- train.model(sc.obj.kegg, method='lasso', perform.fs = TRUE,
param.fs = list(thres.fs=200, method.fs='gFC',
direction='absolute'))
## Trained lasso models successfully.
sc.obj.kegg <- make.predictions(sc.obj.kegg)
## Made predictions successfully.
sc.obj.kegg <- evaluate.predictions(sc.obj.kegg)
## Evaluated predictions successfully.
Finally, we can compare the KEGG model to the model based on mOTUs data:
model.evaluation.plot(mOTUs=sc.obj, KEGG=sc.obj.kegg,
fn.plot='./eval_plot_func.pdf')
## Plotted evaluation of predictions successfully to: ./eval_plot_func.pdf
You can find more information about SIAMCAT on https://siamcat.embl.de or on Bioconductor under https://www.bioconductor.org/packages/release/bioc/html/SIAMCAT.html
There you can also find several vignettes which go into more detail about different applications for SIAMCAT.
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] DiagrammeR_1.0.6.1 labdsv_2.0-1 mgcv_1.8-33 nlme_3.1-150
## [5] vegan_2.5-6 lattice_0.20-41 permute_0.9-5 SIAMCAT_1.11.0
## [9] phyloseq_1.32.0 mlr_2.18.0 ParamHelpers_1.14 forcats_0.5.0
## [13] stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.4.0
## [17] tidyr_1.1.2 tibble_3.0.4 ggplot2_3.3.2 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 colorspace_1.4-1 ellipsis_0.3.1
## [4] XVector_0.28.0 fs_1.5.0 rstudioapi_0.11
## [7] farver_2.0.3 fansi_0.4.1 lubridate_1.7.9
## [10] xml2_1.3.2 codetools_0.2-18 splines_4.0.2
## [13] PRROC_1.3.1 knitr_1.30 ade4_1.7-16
## [16] jsonlite_1.7.1 pROC_1.16.2 broom_0.7.2
## [19] gridBase_0.4-7 cluster_2.1.0 dbplyr_2.0.0
## [22] compiler_4.0.2 httr_1.4.2 backports_1.2.0
## [25] assertthat_0.2.1 Matrix_1.2-18 cli_2.1.0
## [28] visNetwork_2.0.9 htmltools_0.5.0 prettyunits_1.1.1
## [31] tools_4.0.2 igraph_1.2.6 gtable_0.3.0
## [34] glue_1.4.2 reshape2_1.4.4 LiblineaR_2.10-8
## [37] fastmatch_1.1-0 Rcpp_1.0.5 parallelMap_1.5.0
## [40] Biobase_2.48.0 cellranger_1.1.0 vctrs_0.3.4
## [43] Biostrings_2.56.0 multtest_2.44.0 ape_5.4-1
## [46] iterators_1.0.13 xfun_0.19 rvest_0.3.6
## [49] lifecycle_0.2.0 XML_3.99-0.5 beanplot_1.2
## [52] zlibbioc_1.34.0 MASS_7.3-53 scales_1.1.1
## [55] hms_0.5.3 parallel_4.0.2 biomformat_1.16.0
## [58] rhdf5_2.32.4 RColorBrewer_1.1-2 BBmisc_1.11
## [61] curl_4.3 yaml_2.2.1 gridExtra_2.3
## [64] stringi_1.5.3 S4Vectors_0.26.1 corrplot_0.84
## [67] foreach_1.5.1 checkmate_2.0.0 BiocGenerics_0.34.0
## [70] shape_1.4.5 rlang_0.4.8 pkgconfig_2.0.3
## [73] matrixStats_0.57.0 evaluate_0.14 Rhdf5lib_1.10.1
## [76] htmlwidgets_1.5.2 labeling_0.4.2 tidyselect_1.1.0
## [79] plyr_1.8.6 magrittr_1.5 R6_2.5.0
## [82] IRanges_2.22.2 generics_0.1.0 DBI_1.1.0
## [85] pillar_1.4.6 haven_2.3.1 withr_2.3.0
## [88] survival_3.2-7 modelr_0.1.8 crayon_1.3.4
## [91] rmarkdown_2.5 progress_1.2.2 grid_4.0.2
## [94] readxl_1.3.1 data.table_1.13.2 infotheo_1.2.0
## [97] reprex_0.3.0 digest_0.6.27 stats4_4.0.2
## [100] munsell_0.5.0 glmnet_4.0-2